-
Notifications
You must be signed in to change notification settings - Fork 2.7k
fix: change consul event module to shared dict #12773
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
| local function persist_all_services_to_shm() | ||
| if not consul_dict then | ||
| return | ||
| end | ||
|
|
||
| local data, err = core.json.encode(all_services) | ||
| if not data then | ||
| log.error("failed to encode consul services for shared dict: ", err) | ||
| return | ||
| end | ||
|
|
||
| local function discovery_consul_callback(data, event, source, pid) | ||
| all_services = data | ||
| log.notice("update local variable all_services, event is: ", event, | ||
| "source: ", source, "server pid:", pid, | ||
| ", all services: ", json_delay_encode(all_services, true)) | ||
| local ok, set_err = consul_dict:set(consul_dict_services_key, data) | ||
| if not ok then | ||
| log.error("failed to store consul services in shared dict: ", set_err) | ||
| return | ||
| end | ||
| end | ||
|
|
||
|
|
||
| local function sync_all_services_from_shm(force_log) | ||
| if not consul_dict then | ||
| return | ||
| end | ||
|
|
||
| local data = consul_dict:get(consul_dict_services_key) | ||
| if not data then | ||
| if force_log then | ||
| log.info("consul shared dict services empty") | ||
| end | ||
| return | ||
| end | ||
|
|
||
| local decoded, err = core.json.decode(data) | ||
| if not decoded then | ||
| log.error("failed to decode consul services from shared dict: ", err) | ||
| return | ||
| end | ||
|
|
||
| all_services = decoded | ||
| end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both functions writing to and reading from shared memory through all_services seem very dangerous. It may lead to re-writing the data just read from shared memory instead of getting the latest data from consul due to unexpected execution order. It is recommended to remove the use of the all_services variable, simplifying it so that only a privileged agent starts a timer to periodically fetch data from consul, while other workers only load data from shared memory.
Description
Configuring Consul service discovery in APISIX and restarting it while APISIX is continuously receiving traffic will result in frequent 503 errors.
In the design of Consul service discovery, only worker 0 directly pulls nodes from Consul and updates data. Other workers rely on
events:registerto receive broadcasts. If, when APISIX restarts, any worker has not yet completedevents:register, but the service list data broadcast has already been sent, these workers will miss receiving data, and requests sent to that worker will return a 503 error.This problem can be avoided by changing the event module to shared dict.
Which issue(s) this PR fixes:
Fixes #12398
Checklist